[Picture Gallery]
 

"Training for Optical Character Recognition and Khmer Language Processing"
   
Overview

The Cambodia component of the PAN Localization project successfully built preliminary language processing applications in Phase I of the project. It was then time to move ahead and develop more intricate tools for Khmer NLP, e.g. a text-to-speech system, optical character recognition and mobile SMS. The team, however, needed training before it could design such applications.

Unfortunately, the senior members of the PAN team left for higher education at the same time and were replaced by a new, relatively inexperienced team. Although all of the new PAN Cambodia team members were qualified computer scientists, they did not have much experience in NLP application development. It took them a while to become familiar with the working environment, and especially with more complex computing tasks.

   
Objectives

 

The prime objective of the training was to equip the Cambodia team with basic local language computing technologies. The training was also meant to guide the team in fulfilling its commitments to the project and finishing the work on time. The team received training that started from basic programming skills and led towards advanced application development. Broadly, the training covered the following objectives:

  • Khmer OCR

  • OpenOffice plug-ins for Khmer applications

  • Khmer collation support for MySQL

  • Automatic POS tagger for Khmer

  • Khmer Lexicon development

  • Khmer Tagged Corpus

  • Khmer SMS software for Java-based mobile phones

  • Workshops on Khmer Application training for Students and University Staff

The following tasks were planned to help the team achieve the objectives of the PAN Localization project:

Technical Tasks:

  • Programming skills development in C++

  • Fundamentals of Digital Image Processing

  • Modular understanding of OCR system

  • Introduction to Artificial Intelligence

  • OpenOffice embedding

  • Part of Speech Tagging techniques

  • Collation embedding in MySQL

  • Development on Mobile Platform

Non-Technical Tasks:

  • Training for project planning and management

  • Technical report writing

   
Challenges

The prime challenges faced during the training were the sustainability of the work and the capacity building of human resources. The Phase I PAN team was well versed in language processing applications, but there was little overlap time in which the new team could absorb its skills. In addition, image processing was a new field for the new developers, and they also needed some warm-up exercises in robust, error-free programming. It took them a while (the first one and a half months of training) to become hands-on with image handling and artificial intelligence processing.

   
Trainer Profile

Trainer Name: Ahmed Muaz
Ahmed Muaz is a computer science graduate (2007) of FAST-NU, Lahore, and is currently enrolled in a Master's program. His area of specialization is Natural Language Processing. He has been part of the PAN Localization regional secretariat team since 2005 and has worked on different sub-projects. He currently serves as an Associate Development Engineer at CRULP.

   
Training Participants

Ms. Khem Sochenda
Ms. Sophea Vann
Mr. Ing LengIeng
Mr. Tith Sakal
Mr. Oudom Keo
Mr. Sovathena Neth
Mr. Visal
   
Discussion and Conclusion

 

The training was very successful, as it achieved all planned goals. The Cambodia Country Leader (Mr. Chea Sok Huor) played a significant role in this success by providing the team and the trainer with all required resources. The newly hired team lead and manager were guided and trained to accomplish tasks on time. During the latter half of the training, one seminar and two workshops were also conducted.

 
